68 research outputs found

    Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction?

    Full text link
    The prediction of material properties plays a crucial role in the development and discovery of materials for diverse applications, such as batteries, semiconductors, catalysts, and pharmaceuticals. Recently, there has been growing interest in data-driven approaches that use machine learning technologies in combination with conventional theoretical calculations. In materials science, the prediction of unobserved values, commonly referred to as extrapolation, is particularly critical for property prediction, as it enables researchers to gain insight into materials beyond the limits of the available data. However, even with recent advances in powerful machine learning models, accurate extrapolation is still widely recognized as a significantly challenging problem. Self-supervised pretraining, meanwhile, is a machine learning technique in which a model is first trained on unlabeled data using relatively simple pretext tasks before being trained on labeled data for the target tasks. Because self-supervised pretraining can effectively utilize material data without observed property values, it has the potential to improve a model's extrapolation ability. In this paper, we clarify how such self-supervised pretraining can enhance extrapolation performance. We propose an experimental framework for this demonstration and empirically reveal that, while models were unable to accurately extrapolate absolute property values, self-supervised pretraining enables them to learn the relative tendencies of unobserved property values and thereby improves extrapolation performance.
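The evaluation split this abstract describes can be sketched as follows. This is a minimal illustrative toy, not the paper's actual framework: the model, the threshold, and all names are assumptions. It trains a simple surrogate only on samples whose property value falls below a cutoff, then tests in the held-out high-value region, separating absolute accuracy from preserved relative ordering.

```python
# Hedged sketch of an extrapolative evaluation split: train only where the
# property value is "observed" (below a threshold), test beyond it.
# The quadratic toy relation and the linear surrogate are assumptions.
random_xs = [i / 10 for i in range(100)]

def property_of(x):          # toy "true" structure-property relation
    return x ** 2

data = [(x, property_of(x)) for x in random_xs]
train = [(x, y) for x, y in data if y < 25.0]   # observed region only
test = [(x, y) for x, y in data if y >= 25.0]   # extrapolation region

# Toy surrogate: ordinary least-squares line fit on the training region.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
predict = lambda x: slope * x + intercept

preds = [predict(x) for x, _ in test]
truth = [y for _, y in test]

# Absolute accuracy degrades outside the training range ...
mae = sum(abs(p - t) for p, t in zip(preds, truth)) / len(test)

# ... but the *relative tendency* (here: the monotone ordering of unobserved
# values) can survive, which is the distinction the abstract draws.
ordered = all(p1 < p2 for p1, p2 in zip(preds, preds[1:]))
print(f"extrapolation MAE = {mae:.2f}, ordering preserved = {ordered}")
```

The gap between a large MAE and a preserved ordering is exactly the "relative tendencies without absolute values" behavior the abstract attributes to pretrained models.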

    On Data Imbalance in Molecular Property Prediction with Pre-training

    Full text link
    Revealing and analyzing the various properties of materials is an essential and critical issue in the development of materials, including batteries, semiconductors, catalysts, and pharmaceuticals. Traditionally, these properties have been determined through theoretical calculations and simulations. However, it is not practical to perform such calculations for every candidate material. Recently, an approach combining theoretical calculation with machine learning has emerged: machine learning models are trained on a subset of theoretical calculation results to construct a surrogate model that can then be applied to the remaining materials. Separately, a technique called pre-training is used to improve the accuracy of machine learning models. Pre-training involves training the model on a pretext task, which differs from the target task, before training it on the target task. This process extracts features of the input data, stabilizing the learning process and improving its accuracy. However, in the case of molecular property prediction, there is a strong imbalance in the distribution of input data and features, which may bias learning toward frequently occurring data during pre-training. In this study, we propose an effective pre-training method that addresses this imbalance in the input data. We aim to improve the final accuracy by modifying the loss function of a representative existing pre-training method, node masking, to compensate for the imbalance. We investigate and assess the impact of the proposed imbalance compensation on pre-training and on the final prediction accuracy through experiments and evaluations on molecular property prediction benchmarks.
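One common way to compensate for label imbalance in a masked-node loss is inverse-frequency reweighting; the sketch below illustrates that general idea in isolation. It is an assumption for illustration, not the paper's exact loss: the weighting scheme, the toy label distribution, and all names are hypothetical.

```python
# Hedged sketch of imbalance compensation for node masking: reweight the
# masked-label reconstruction loss by inverse label frequency, so rare
# atom types contribute as much to pre-training as common ones.
import math
from collections import Counter

# Toy node labels (e.g. atom types), heavily imbalanced toward carbon.
labels = ["C"] * 90 + ["N"] * 7 + ["O"] * 3
freq = Counter(labels)
total = len(labels)

# Inverse-frequency weights, normalized so a uniform dataset gives weight 1.
weight = {c: total / (len(freq) * n) for c, n in freq.items()}

def weighted_masked_loss(predicted_probs, true_label):
    """Cross-entropy for one masked node, scaled by its class weight."""
    return -weight[true_label] * math.log(predicted_probs[true_label])

# A model that always leans toward the majority class is penalized far more
# when the masked node actually has a rare label:
probs = {"C": 0.90, "N": 0.07, "O": 0.03}
print(weighted_masked_loss(probs, "C"))  # common class, small weight
print(weighted_masked_loss(probs, "O"))  # rare class, large weight
```

Without the weights, the 90% majority class would dominate the gradient, which is the biased-learning failure mode the abstract points out.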

    Argobots: A Lightweight Low-Level Threading and Tasking Framework

    Get PDF
    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to particular applications or architectures or not sufficiently powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models and runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with a rich set of controls that allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
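The core idea behind user-level threading can be sketched language-neutrally. The following is an analogy only, not the Argobots C API: many cooperatively scheduled work units are multiplexed onto one OS thread, with the scheduler living entirely in user space; here each "ULT" is a Python generator and `yield` plays the role of a cooperative yield point.

```python
# Hedged analogy (not Argobots itself): a minimal user-space round-robin
# scheduler over cooperative work units, illustrating why user-level
# threads are cheap -- switching is just resuming a generator, with no
# OS involvement.
from collections import deque

def ult(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"   # cooperative yield back to the scheduler

def round_robin(ults):
    """One ready pool, FIFO dispatch, all in user space."""
    ready, trace = deque(ults), []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))   # run until the next yield point
            ready.append(current)         # still runnable: requeue
        except StopIteration:
            pass                          # work unit finished
    return trace

trace = round_robin([ult("A", 2), ult("B", 2)])
print(trace)  # interleaved execution: ['A:0', 'B:0', 'A:1', 'B:1']
```

Argobots generalizes this picture with multiple execution streams, pluggable pools, and pluggable schedulers, which is the "rich set of controls" the abstract refers to.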

    Schematic: A Concurrent Object-Oriented Extension to Scheme

    No full text
    A concurrent object-oriented extension to the programming language Scheme, called Schematic, is described. Schematic supports familiar constructs often used in typical parallel programs (future and higher-level macros such as plet and pbegin), which are actually defined atop a very small number of fundamental primitives. In this way, Schematic achieves both convenience for typical concurrent programming and simplicity and flexibility of the language kernel. Schematic also supports concurrent objects, which exhibit more natural and intuitive behavior than "bare" (unprotected) shared memory and permit more concurrency than the traditional Actor model. Schematic will be useful for intensive parallel applications on parallel machines or networks of workstations, concurrent GUI programming, distributed programming over networks, and even concurrent shell programming. To appear in Proceedings of Object Based Parallel and Distributed Co..
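The future construct the abstract builds on has the same shape in many languages; the sketch below illustrates its semantics with Python's standard library rather than Schematic's Scheme syntax (the function name and values are assumptions for illustration).

```python
# Hedged illustration of future semantics: the computation starts
# concurrently, and the caller blocks only when it touches the result.
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    return n * n

with ThreadPoolExecutor() as pool:
    f = pool.submit(slow_square, 7)   # roughly: (future (slow-square 7))
    # ... the caller is free to do other work here ...
    result = f.result()               # "touch": blocks until the value exists

print(result)  # 49
```

Higher-level forms like plet and pbegin can then be understood as macros that spawn several such futures and touch them in a structured way, which matches the abstract's claim that they are defined atop a few fundamental primitives.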

    Efficient and Reusable Implementation of Fine-Grain Multithreading and Garbage Collection on Distributed-Memory Parallel Computers

    No full text
    Report number: Otsu 13537 ; Date of degree conferral: 1997-09-22 ; Degree category: Doctorate by dissertation ; Degree: Doctor of Science ; Diploma number: No. 13537 ; Graduate school / department: Graduate School of Science

    *+.61-/4;@??d7930:58h2icabgjlafe=k

    No full text
    This paper discusses a broad range of issues that make concurrent object-oriented programming (COOP) on distributed-memory multicomputers more comfortable and efficient. The topics presented include language design, the abstract machine, and memory management (including garbage collection). (1) As for the language design, a variant of future is introduced so that programmers can easily specify parallel activities and the synchronization between them. The language is made explicitly parallel to give programmers a simple cost model and opportunities for manual optimization, thereby achieving performance without imposing too much burden on the compiler. The expressive power of the language is demonstrated with typical synchronization code as well as example applications. (2) Runtime implementation issues are described in terms of our proposed abstract machine, StackThreads, thereby making the proposed mechanisms applicable not only to COOP languages but also to other languages such as functional languages. In Stac..

    Efficient and Reusable Implementation of Fine-Grain Multithreading and Garbage Collection on Distributed-Memory Parallel Computers

    No full text
    This thesis studies efficient runtime systems for parallelism management (multithreading) and memory management (garbage collection) on large-scale distributed-memory parallel computers. Both are fundamental primitives for implementing high-level parallel programming languages that support dynamic parallelism and dynamic data structures. A distinguishing feature of the developed multithreading system is that it tolerates a large number of threads on a single CPU while allowing direct reuse of existing sequential C compilers. In fact, it is able to turn any standard C procedure call into an asynchronous one. Given such a runtime system, the compiler of a high-level parallel programming language can fork a new thread simply by making a C procedure call to the corresponding C function. A thread can block its execution by calling a library procedure that saves the stack frame of the thread and unwinds the stack frames. To resume a thread, StackThreads provides another runtime routine that rebuilds the..

    A Methodology for Constructing Portable and Simple Global Garbage Collectors

    No full text
    Many garbage collectors on parallel computers are written in sequential languages and are therefore not portable across machines with different communication primitives. Moreover, the description of garbage collectors on distributed-memory machines, which use asynchronous messages, is complex. We implemented a garbage collector for the parallel object-oriented language Schematic by using Schematic itself. We show that a garbage collector can be more portable and simple when it is described on top of a parallel language, which is machine-independent and equipped with high-level communication constructs. We implemented the garbage collector on the distributed-memory machine AP1000 and measured its performance. 1 Introduction One of the difficult factors in constructing parallel languages is implementing global garbage collectors (garbage collectors (GC) that detect garbage that has been shared among several processors). The problems that occur in implementing garbage collectors on parallel ..
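The global-GC problem the abstract defines, detecting garbage shared among several processors, reduces at its core to reachability over a graph whose edges may cross processor boundaries. The sketch below shows only that core idea; the flat dictionaries, the per-processor root sets, and all names are illustrative assumptions, not Schematic or the AP1000 collector.

```python
# Hedged sketch of the global-GC idea: objects live on several
# "processors", references may cross processor boundaries, and a global
# mark phase walks the whole graph from every processor's root set.
heap = {                       # object id -> ids it references
    "a": ["b"], "b": ["c"],    # a -> b -> c (imagine c on another processor)
    "c": [],
    "d": ["e"], "e": [],       # d -> e is unreachable: global garbage
}
roots = {"proc0": ["a"], "proc1": []}   # per-processor root sets

def global_mark(heap, roots):
    """Mark phase: everything reachable from any root, remote edges too."""
    marked, stack = set(), [r for rs in roots.values() for r in rs]
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])     # following a remote reference would
                                        # require an asynchronous message
    return marked

live = global_mark(heap, roots)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))   # ['a', 'b', 'c'] ['d', 'e']
```

The abstract's point is that the message sends hidden in the remote-edge traversal are exactly where a sequential-language implementation gets complex, whereas a parallel language with high-level communication constructs can express them directly.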